    Recovering Fine Details for Neural Implicit Surface Reconstruction

    Recent works on implicit neural representations have made significant strides. Learning implicit neural surfaces using volume rendering has gained popularity in multi-view reconstruction without 3D supervision. However, accurately recovering fine details is still challenging, due to the underlying ambiguity of geometry and appearance representation. In this paper, we present D-NeuS, a volume rendering-based neural implicit surface reconstruction method capable of recovering fine geometry details, which extends NeuS by two additional loss functions targeting enhanced reconstruction quality. First, we encourage the rendered surface points from alpha compositing to have zero signed distance values, alleviating the geometry bias arising from transforming SDF to density for volume rendering. Second, we impose multi-view feature consistency on the surface points, derived by interpolating SDF zero-crossings from sampled points along rays. Extensive quantitative and qualitative results demonstrate that our method reconstructs high-accuracy surfaces with details, and outperforms the state of the art.
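
The abstract above mentions deriving surface points by interpolating SDF zero-crossings from sampled points along rays. The following is a minimal sketch of that general idea, not the D-NeuS implementation: it assumes linear interpolation between the two adjacent samples that bracket the first outside-to-inside sign change, and the function name `first_zero_crossing` and its arguments are hypothetical.

```python
from typing import Optional, Sequence


def first_zero_crossing(t_vals: Sequence[float],
                        sdf_vals: Sequence[float]) -> Optional[float]:
    """Return the ray depth of the first outside-to-inside SDF sign change.

    t_vals:   increasing depths of the samples along one ray (assumed input).
    sdf_vals: signed distance value at each sample (positive = outside).
    Returns None when the ray never crosses the surface.
    """
    for i in range(len(t_vals) - 1):
        s0, s1 = sdf_vals[i], sdf_vals[i + 1]
        # A crossing lies between a strictly positive and a non-positive sample.
        if s0 > 0.0 and s1 <= 0.0:
            # Linear interpolation weight at which the SDF reaches zero.
            w = s0 / (s0 - s1)
            return t_vals[i] + w * (t_vals[i + 1] - t_vals[i])
    return None


# Samples at depths 0..3; the sign flips between depth 1 and depth 2.
depth = first_zero_crossing([0.0, 1.0, 2.0, 3.0], [1.5, 0.5, -0.5, -1.5])
```

The 3D surface point would then be the ray origin plus `depth` times the ray direction; how the paper samples rays and applies the consistency loss at that point is not specified in the abstract.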

    RUSHES—an annotation and retrieval engine for multimedia semantic units

    Multimedia analysis and reuse of raw, unedited audio-visual content, known as rushes, is gaining acceptance by a large number of research labs and companies. A set of research projects are considering multimedia indexing, annotation, search and retrieval in the context of European funded research, but only the FP6 project RUSHES is focusing on automatic semantic annotation, indexing and retrieval of raw and unedited audio-visual content. Professional content creators and providers, as well as home users, deal with this type of content, and therefore novel technologies for semantic search and retrieval are required. In this paper, we present a summary of the most relevant achievements of the RUSHES project, focusing on specific approaches for automatic annotation as well as the main features of the final RUSHES search engine.

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    Supporting linguistic research using generic automatic audio/video analysis

    Automatic analysis can speed up the annotation process and free up human resources, which can then be spent on theorizing instead of tedious annotation tasks. We will describe selected automatic tools that support the most time-consuming steps in annotation, such as speech and speaker segmentation, time alignment of existing transcripts, automatic scene analysis with respect to camera motion, face/person detection, and the tracking of head and hands as well as the resulting gesture analysis. National Foreign Language Resource Cente

    Image-based rendering for teleconference systems

    To obtain an image-based immersive presence in a virtual world, two important factors should be considered: system configuration and multiview representation. We present two non-adversary system configurations. The first is the well-known convergent wide-baseline set-up, while the second is a unique proposal under investigation at our institute, which is based around a parallel multiple narrow-baseline camera set-up. In the domain of multiview representation we introduce two non-conflicting representations that can be implemented independently of the chosen system configuration, depending on whether compression or scalability is important to the overall system. We then discuss our implementation of an image-based rendering system for an immersive teleconferencing application where three conferees meet around a shared virtual table. The system uses a wide-baseline configuration with two stereo camera pairs capturing the reference images. The system is designed to deal with hand gestures as well as the synthesis of areas occluded in one or more of the reference images but required in the derived view. We introduce the notion of a confidence map designed to indicate, for the derived image, which reference image should provide the required texture and disparity information for a surface.

    ACM multimedia 2010 workshop on 3D video processing

    Research on 3D video processing has gained a tremendous amount of momentum due to advances in video communications, broadcasting and entertainment technology (e.g., animation blockbusters like Avatar and Up). There is an increasing need for reliable technologies capable of visualizing 3D content from viewpoints decided by the user; the 2010 football World Cup in South Africa has made very evident the need to replay crucial football footage from new viewpoints to decide whether the ball has or has not crossed the goal line. Remote videoconferencing prototypes are introducing a sense of presence into large- and small-scale (PC-based) systems alike by manipulating single and multiple video sequences to improve eye contact and place participants in convincing virtual spaces. All this, and more, is pushing the introduction of 3D services and the development of high-quality 3D displays, to be available in a future which is drawing nearer and nearer.

    MULTIPLE NARROW-BASELINE SYSTEM FOR IMMERSIVE TELECONFERENCING

    An important aim of immersive teleconferencing systems is to create realistic 3D virtual views of remote conferees. Hence, systems should be able to deal with hand gestures as well as occluded areas in reference images required in derived views. The quality of such derived views is dependent not only on the analysis and synthesis process but also on the multiview camera set-up. Often the popular convergent wide-baseline stereo approach aspires to achieve too much through a single camera pair: maximum information and reliable disparity maps. We identify how this dichotomy leads to problems in the analysis and synthesis process, often leading to a restrictive system-specific solution. We then define a new approach, a multiple narrow-baseline set-up, designed to overcome the limitations of the wide-baseline set-up, being modular, both in terms of system requirements and algorithmically, and scalable with respect to the number of conferees. Key words: multiple narrow-baseline, immersive teleconferencing, confidence map.